support for explicit test_dataset definition for evals #786
Conversation
Hmm, our data pipeline is getting a bit complex. I had a bit of a hard time following the flow.
Force-pushed from 8f11779 to 3bcdab4
Adding this config to the README would be helpful.
Force-pushed from feed723 to fb72cb5
This will address #875.
I think we'll need to add documentation about this, and also consider whether it would be appropriate to hardcode the …
@NanoCode012 What do you mean by hardcoding?
Force-pushed from fb72cb5 to 4e5da2a
Do you have an example or documentation of how this can be used?
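In case it helps while docs are pending: a minimal sketch of what such a config might look like, assuming `test_datasets` mirrors the existing `datasets` schema. The `path`, `ds_type`, and `type` values are placeholders, and the `val_set_size: 0` line reflects my assumption that an explicit test set replaces the automatic train/val split rather than combining with it.

```yaml
# Hypothetical config sketch -- field names assume the same schema as `datasets`.
test_datasets:
  - path: data/eval.jsonl   # placeholder local JSONL file
    ds_type: json
    split: train            # local JSON loads typically expose a single "train" split
    type: completion

# Assumption: with an explicit test set, the automatic split is disabled.
val_set_size: 0
```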
I was trying to reverse engineer how it's supposed to work, and maybe there's a bug here:

```python
dataset, prompters = load_tokenized_prepared_datasets(
    tokenizer, cfg, default_dataset_prepared_path
)
```

Shouldn't you pass …
First pass at supporting different datasets for evals rather than splitting the test dataset.
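That description implies a branch in the data pipeline; here is a rough sketch of the control flow it suggests, not the PR's actual code. `load_tokenized_prepared_datasets` is the loader quoted earlier in the thread, while `prepare_train_and_eval`, `load_explicit_test_datasets`, and the `cfg.test_datasets` field are hypothetical stand-ins for whatever the implementation actually does.

```python
# Illustrative sketch only -- not the PR's implementation.
from axolotl.utils.data import load_tokenized_prepared_datasets  # loader quoted above


def prepare_train_and_eval(tokenizer, cfg, default_dataset_prepared_path):
    # The training set is always built from cfg.datasets.
    train_dataset, prompters = load_tokenized_prepared_datasets(
        tokenizer, cfg, default_dataset_prepared_path
    )

    if getattr(cfg, "test_datasets", None):
        # Explicit eval data: tokenize cfg.test_datasets through the same
        # pipeline instead of carving a slice out of the training data.
        # (Hypothetical helper -- stands in for the real loading path.)
        eval_dataset = load_explicit_test_datasets(
            tokenizer, cfg, default_dataset_prepared_path
        )
    else:
        # Old behavior: reserve a val_set_size fraction of the train set.
        split = train_dataset.train_test_split(test_size=cfg.val_set_size)
        train_dataset, eval_dataset = split["train"], split["test"]

    return train_dataset, eval_dataset, prompters
```

Routing the explicit test set through the same tokenization path would keep train and eval preprocessing consistent, which seems to be the concern behind the earlier comment about the pipeline getting complex.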